Data model type system

Nodes in the graph database were modeled using JSON data types. To support a rich querying experience from .NET languages, we built support for projecting these types onto “plain old CLR types”, akin to what object/relational mapping technologies do. This led to the introduction of the so-called “Bing data model”, which enables annotating .NET types with attributes to make them usable, through LINQ support, for reading and storing entities in the graph.

code:C#
 class Person
 {
     // The attribute argument is the identifier of the property in the graph.
     [Mapping("bing://entities/person/name")]
     public string Name { get; set; }
 }

The string passed to the Mapping attribute is treated as an opaque identifier in the stack, but the recommendation was to use well-known URIs from ontologies such as schema.org. Given that these types are referenced within expression trees representing query expressions that are transmitted from the client to the service, we had to come up with a strategy to break binary dependencies on user types. On top of this, we also wanted to ensure that these schemas are open-ended, i.e. additional properties can be added at any time. This led to the adoption of a structural type system underneath the .NET projection.
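
As a rough illustration of how such annotations can yield a structural view of a type, the sketch below uses reflection to collect the mapping identifiers of a type's annotated properties. The `MappingAttribute` definition, its `Identifier` property, the unmapped `DisplayLabel` property, and the `StructuralShape` helper are assumptions made for this example; they are not taken from the Bing data model or from Nuqleon.DataModel. The identifiers are only ever handled as opaque strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Usage: list the structural shape derived from the Person type.
foreach (var slot in StructuralShape.Of(typeof(Person)))
{
    Console.WriteLine($"{slot.Key} : {slot.Value.Name}");
}

// Hypothetical stand-in for the data model's Mapping attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class MappingAttribute : Attribute
{
    public MappingAttribute(string identifier) => Identifier = identifier;

    // Opaque identifier; this sketch never parses or dereferences it.
    public string Identifier { get; }
}

public class Person
{
    [Mapping("bing://entities/person/name")]
    public string Name { get; set; }

    // Hypothetical unmapped property: it is not part of the structural shape.
    public string DisplayLabel { get; set; }
}

public static class StructuralShape
{
    // Maps each mapping identifier to the CLR type of the annotated property.
    // Neither the type name nor the property names appear in the result.
    public static IReadOnlyDictionary<string, Type> Of(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => (Property: p, Mapping: p.GetCustomAttribute<MappingAttribute>()))
            .Where(x => x.Mapping != null)
            .ToDictionary(x => x.Mapping.Identifier, x => x.Property.PropertyType);
}
```

Because the result is keyed by identifier rather than by member name, two unrelated .NET types that carry the same mappings produce the same shape, which is the property a structural type system relies on.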

The use of a structural type system projected into “plain old” (nominally typed) .NET types is in a way inspired by the work on Oslo/M. During the sunset of the Cloud Programmability Team with the “data refinery” project, a similar projection was built for the distributed LINQ to Everything demo running on top of the original SSIS designer.

The mechanism underneath the .NET type projection onto structural entities was based on a process called “anonymization” of types. The key realization here was that LINQ query expressions already needed somewhat structural type notions in order to support features such as the use of anonymous types for grouping keys or projections, or transparent identifiers used during the “desugaring” phase of translating LINQ query expressions to more primitive operations. As such, expression tree serialization already supported C# 3.0 and Visual Basic 9 anonymous types, by capturing the shape of the type (structural) rather than the name of the type (nominal). Support for data model types in query expressions was enabled by introducing an expression rewrite phase prior to serialization, causing nominal data model types to be erased in favor of anonymous types retaining their structure. For the example shown above, the corresponding type would look a bit like this:

code:C#
 // Illustrative only: neither name below is a legal identifier in C# source.
 [CompilerGenerated]
 class <>__AnonymousType1
 {
     public string bing://entities/person/name { get; set; }
 }

Note that the names of types (e.g. Person) and members (e.g. Name) got erased. Type names vanish completely and are dropped from the serialization format (i.e. <>__AnonymousType1 doesn’t occur in the payload), while property names are replaced by the identifiers applied to them using the Mapping attribute. This effectively led to zero nominal entity type sharing between client and service, which was a prerequisite to make the graph database work without requiring sharing of assemblies, versioning woes, etc.
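
To make the rewrite phase more concrete, here is a minimal sketch of an expression visitor that erases a nominal type from a predicate. It is not the implementation described above: emitting anonymous types at run time takes considerably more code, so this sketch swaps in a dictionary keyed by mapping identifier as the structural representation. `NominalEraser` and the `MappingAttribute` definition are hypothetical, and only direct accesses to mapped properties of the lambda parameter are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

Expression<Func<Person, bool>> nominal = p => p.Name == "Bill Gates";
var structural = NominalEraser.Erase(nominal);

// The rewritten predicate runs against any record with the right shape.
var record = new Dictionary<string, object> { ["bing://entities/person/name"] = "Bill Gates" };
Console.WriteLine(structural.Compile()(record)); // prints True

// Hypothetical stand-in for the data model's Mapping attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class MappingAttribute : Attribute
{
    public MappingAttribute(string identifier) => Identifier = identifier;
    public string Identifier { get; }
}

public class Person
{
    [Mapping("bing://entities/person/name")]
    public string Name { get; set; }
}

// Rewrites accesses to mapped properties into lookups by mapping identifier.
public sealed class NominalEraser : ExpressionVisitor
{
    private static readonly PropertyInfo Indexer =
        typeof(IDictionary<string, object>).GetProperty("Item");

    private readonly ParameterExpression _from;
    private readonly ParameterExpression _to;

    private NominalEraser(ParameterExpression from, ParameterExpression to)
    {
        _from = from;
        _to = to;
    }

    public static Expression<Func<IDictionary<string, object>, bool>> Erase<T>(
        Expression<Func<T, bool>> predicate)
    {
        var target = Expression.Parameter(typeof(IDictionary<string, object>), "record");
        var body = new NominalEraser(predicate.Parameters[0], target).Visit(predicate.Body);
        return Expression.Lambda<Func<IDictionary<string, object>, bool>>(body, target);
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        var mapping = node.Member.GetCustomAttribute<MappingAttribute>();
        if (node.Expression == _from && mapping != null)
        {
            // person.Name  ->  (string)record["bing://entities/person/name"]
            var lookup = Expression.Property(_to, Indexer, Expression.Constant(mapping.Identifier));
            return Expression.Convert(lookup, node.Type);
        }
        return base.VisitMember(node);
    }
}
```

After the rewrite, the tree no longer references `Person` or `Name`, so a service could evaluate it without loading the client's assembly.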

The data model described here is still in use by Reaqtor, albeit using a different implementation strategy. Assemblies starting with Nuqleon.DataModel reveal this inheritance of assets. Use of the data model in Reaqtor was motivated by the “reactive graph” vision where spatial/temporal traversal of entities was the ultimate goal. By using the same data model for persisted entities and streaming entities, unification of the object model used across these programming models was made possible. The ultimate goal of this was to support an ontology of the world’s data, which was a key ingredient to the team’s vision.

Note that this strategy also removes the need for a separate schema repository. By using metadata APIs to query data processing systems (such as graphs, document databases, or stream processing platforms) for available data collections (e.g. nodes, edges, or streams), structural type information about the entities within these collections can be obtained. In effect, a collection and the entities present within it act as a witness of structural types. These metadata APIs can be used to discover the structure of data, which can then be joined with a loosely coupled ontology provider such as schema.org to obtain more information. A traditional developer experience that can be layered on top of this is code generation for strongly typed client libraries (i.e. similar to O/R mapping technologies).
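
The idea of a collection acting as a witness of its structural type can be sketched as follows. This is an illustration under stated assumptions, not one of the metadata APIs mentioned above: entities are modeled as dictionaries keyed by mapping identifier, `StructureDiscovery` is a hypothetical helper, and the `birthYear` identifier is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Dictionary<string, object>>
{
    new() { ["bing://entities/person/name"] = "Bill Gates" },
    // A later entity carries an extra (hypothetical) property.
    new()
    {
        ["bing://entities/person/name"] = "Bill Gates",
        ["bing://entities/person/birthYear"] = 1955,
    },
};

foreach (var slot in StructureDiscovery.Infer(people))
{
    Console.WriteLine($"{slot.Key} : {string.Join(" | ", slot.Value.Select(t => t.Name))}");
}

public static class StructureDiscovery
{
    // Unions the property identifiers seen across all entities and records
    // the CLR types observed for each identifier.
    public static IReadOnlyDictionary<string, ISet<Type>> Infer(
        IEnumerable<IReadOnlyDictionary<string, object>> entities)
    {
        var shape = new Dictionary<string, ISet<Type>>();
        foreach (var entity in entities)
        {
            foreach (var property in entity)
            {
                if (!shape.TryGetValue(property.Key, out var types))
                {
                    shape[property.Key] = types = new HashSet<Type>();
                }
                types.Add(property.Value?.GetType() ?? typeof(object));
            }
        }
        return shape;
    }
}
```

The inferred shape grows as entities with new properties appear, which mirrors the open-ended schemas described earlier; a code generator for typed client libraries could take such a shape as its input.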

Finally, note that using type projections in query expressions in a statically typed language is completely optional. Such types merely provide an enhanced editing experience with rich type checking and compile-time validation. In addition to projected entity types, one can also use static types that represent a dynamic type, such as DynamicObject or JObject.

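code:C#
 // Statically typed: member access through the projected Person type.
 from person in graph.Nodes<Person>()
 where person.Name == "Bill Gates"
 ...
 // versus dynamically typed: the property is addressed by its mapping identifier.
 from person in graph.Nodes<JObject>()
 where person["bing://entities/person/name"] == "Bill Gates"
 ...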

In other words, this approach provides optional static typing on top of a structural and dynamic type system. Erasing static nominal types in favor of a structural type representation also made it possible to eventually implement client libraries for these data processing services in other languages, such as JavaScript or Python.